Chinese speech segmentation method based on Gauss distribution of time spans of syllables
ZHANG Yang, ZHAO Xiaoqun, WANG Digang
Journal of Computer Applications    2016, 36 (5): 1410-1414.   DOI: 10.11772/j.issn.1001-9081.2016.05.1410
Until now, there has been no accurate method for segmenting natural Chinese speech into syllables, although such a method would be valuable for labeling speech against a reference text automatically rather than by hand. Based on two hypotheses, that the durations of Chinese syllables with the same pronunciation follow a Gaussian distribution and that a short-time energy valley exists between two adjacent syllables, a Chinese speech segmentation method based on the Gaussian distribution of syllable durations was proposed. A simplified method based on the distribution of energy valleys was also given, which effectively reduced the time complexity of the segmentation. The experimental results show that the segmentation accuracy (the mean square of the time differences between manual labels and labels produced by this method) reaches 10^-3, and the computing time is less than 1 s in MATLAB on a PC.
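The two hypotheses in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm, which the abstract does not spell out: the function names, the frame-based units, and the dynamic-programming search are all assumptions made for illustration. Candidate boundaries are restricted to short-time energy valleys, and the boundary set is chosen to maximize the summed Gaussian log-likelihood of the syllable durations expected from the reference text.

```python
import math

def short_time_energy(samples, frame_len, hop):
    """Sum of squared samples in each frame (hypothetical helper)."""
    return [sum(x * x for x in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, hop)]

def energy_valleys(energy):
    """Frame indices that are local minima of the energy contour."""
    return [i for i in range(1, len(energy) - 1)
            if energy[i] < energy[i - 1] and energy[i] <= energy[i + 1]]

def gauss_log_pdf(x, mean, std):
    """Log-density of a Gaussian with the given mean and standard deviation."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))

def segment(valleys, n_frames, durations):
    """Choose syllable boundaries among energy valleys.

    durations: one assumed (mean, std) pair in frames per expected syllable.
    Returns len(durations) + 1 boundaries, from 0 to n_frames.
    """
    cands = [0] + sorted(valleys) + [n_frames]
    n_syll = len(durations)
    neg_inf = float("-inf")
    # best[k][j]: best log-likelihood of k syllables ending at cands[j]
    best = [[neg_inf] * len(cands) for _ in range(n_syll + 1)]
    back = [[-1] * len(cands) for _ in range(n_syll + 1)]
    best[0][0] = 0.0
    for k in range(1, n_syll + 1):
        mean, std = durations[k - 1]
        for j in range(1, len(cands)):
            for i in range(j):
                if best[k - 1][i] == neg_inf:
                    continue
                score = best[k - 1][i] + gauss_log_pdf(cands[j] - cands[i], mean, std)
                if score > best[k][j]:
                    best[k][j], back[k][j] = score, i
    if best[n_syll][-1] == neg_inf:
        raise ValueError("not enough candidate valleys for the expected syllables")
    # Trace the chosen boundaries back from the final frame.
    j, bounds = len(cands) - 1, []
    for k in range(n_syll, 0, -1):
        bounds.append(cands[j])
        j = back[k][j]
    bounds.append(cands[j])
    return bounds[::-1]
```

Limiting the search to valley frames rather than every frame keeps the candidate set small, which is one plausible reading of how an energy-valley simplification could reduce running time.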
Chinese speech segmentation into syllables based on energies in different times and frequencies
ZHANG Yang, ZHAO Xiaoqun, WANG Digang
Journal of Computer Applications    2016, 36 (11): 3222-3228.   DOI: 10.11772/j.issn.1001-9081.2016.11.3222
Precise speech segmentation methods help in comparing speech against speech models in speech recognition, and can also greatly improve the efficiency of corpus annotation. A new method for segmenting Chinese speech into syllables based on time-frequency energy features was proposed: firstly, silence frames were located in the traditional way; secondly, unvoiced frames were identified using the difference between energies in different frequency bands; thirdly, voiced frames and speech frames were identified with the help of 0-1 energies in specific frequency ranges; finally, syllable positions were determined from the preceding judgements. The experimental results show that the proposed method, with a syllable error of 0.0297 s and a syllable deviation of 7.93%, is superior to the Merging-Based Syllable Detection Automaton (MBSDA) and to the Gauss fitting method.
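The frame-by-frame decisions described in this abstract can be sketched roughly in Python. This is only an illustration under stated assumptions: the silence threshold, the 2000 Hz band split, and the function names are hypothetical, the paper's 0-1 energy feature is not reproduced, and a naive DFT stands in for whatever spectral analysis the authors used.

```python
import cmath
import math
from itertools import groupby

def band_energy(frame, sample_rate, f_lo, f_hi):
    """Energy of the DFT bins whose frequency lies in [f_lo, f_hi)."""
    n = len(frame)
    total = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if f_lo <= freq < f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(frame))
            total += abs(coeff) ** 2
    return total

def classify_frame(frame, sample_rate, silence_thresh=1e-3, split_hz=2000.0):
    """Label a frame as silence, unvoiced, or voiced (thresholds are assumed)."""
    mean_energy = sum(x * x for x in frame) / len(frame)
    if mean_energy < silence_thresh:
        return "silence"
    low = band_energy(frame, sample_rate, 0.0, split_hz)
    high = band_energy(frame, sample_rate, split_hz, sample_rate / 2 + 1)
    # Unvoiced sounds tend to carry more high-frequency energy than voiced ones.
    return "unvoiced" if high > low else "voiced"

def label_runs(labels):
    """Collapse per-frame labels into (label, start_frame, end_frame) runs."""
    runs, start = [], 0
    for label, group in groupby(labels):
        length = len(list(group))
        runs.append((label, start, start + length))
        start += length
    return runs
```

The runs returned by `label_runs` could then serve as raw material for placing syllable positions, for example by treating each silence-to-speech transition as a candidate syllable onset.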